Introduction

Row

One of the national ambient air quality standards in the US concerns the long-term average level of fine particle pollution, also referred to as PM2.5. The standard says that the “annual mean, averaged over 3 years” cannot exceed 12 micrograms per cubic meter.

The “avgpm25” dataset contains the annual mean PM2.5 averaged over the period 2008 through 2010.

Row

Research Question

Are there any counties in the US that exceed the national standard for fine particle pollution?

Data

Column

Glimpse of the Data

Rows: 571
Columns: 5
$ pm25      <dbl> 10.827805, 11.583928, 11.261996, 9.414423, 11.391494, 12.384…
$ fips      <int> 1069, 1073, 1089, 1097, 1103, 1113, 1117, 1121, 1125, 1127, …
$ region    <chr> "east", "east", "east", "east", "east", "east", "east", "eas…
$ longitude <dbl> -85.35039, -86.82805, -86.58823, -88.13967, -86.91892, -85.1…
$ latitude  <dbl> 31.18973, 33.52787, 34.73079, 30.72226, 34.50702, 32.37600, …

Column

Description

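Each row contains:

  - a five-digit code indicating the county (fips)
  - the region of the country in which the county resides
  - the longitude of the centroid for that county
  - the latitude of the centroid for that county
  - the average PM2.5 level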

Boxplot

Column

Boxplot of PM2.5

Column

Analysis

The distribution excluding outliers appears to be roughly normal. There are more outliers above the upper fence, but there are still outliers below the lower fence.

Histogram

Row

Counties that Exceed the Standard

Although the current national ambient air quality standard is 12 micrograms per cubic meter, it used to be 15. The counties that exceed the former standard of 15 micrograms per cubic meter are:
      pm25 fips region longitude latitude
1 16.19452 6019   west -119.9035 36.63837
2 15.80378 6029   west -118.6833 35.29602
3 18.44073 6031   west -119.8113 36.15514
4 16.66180 6037   west -118.2342 34.08851
5 15.01573 6047   west -120.6741 37.24578
6 17.42905 6065   west -116.8036 33.78331
7 16.25190 6099   west -120.9588 37.61380
8 16.18358 6107   west -119.1661 36.23465

Analysis of the Counties

All of the counties with PM2.5 levels above 15 micrograms per cubic meter are located in the west. Every one of these FIPS codes also begins with 6, which is the state code 06 once the leading zero dropped by the integer column is restored, so all of the counties are located in California.

Row

Histogram of PM2.5

This histogram of the PM2.5 data shows a cutoff value located at the current standard of 12 micrograms per cubic meter.

Analysis of the Histogram

The majority of the data lies below the standard PM2.5 value of 12. The graph looks to be skewed right and bimodal. The median of the data appears to be around 10.

East vs. West

Column

Eastern vs Western US PM2.5 Levels

Column

Analysis of the Boxplots

The eastern region has less variable PM2.5 as compared to the western region. Both boxplots are symmetrical. The eastern region has outliers outside of the bottom whisker, which means they are outside the lower fence. The western region has outliers outside of the upper whisker, which means they are outside of the upper fence. The eastern region has a greater median than the western region.

Violin Plot

Column

East vs. West

Analysis of the Violin Plot

In the eastern region, PM2.5 values are most densely concentrated around 10. In the western region, they are most densely concentrated around 7.

Histogram by Region

Column

East vs. West

Analysis of the Histograms

The eastern distribution looks roughly normal but could be slightly skewed left. There also seem to be two modes, based on the two tall peaks. The western distribution is skewed right and seems to be unimodal. The eastern histogram is also taller, which suggests that its values are more concentrated around a specific value.

Scatterplots

Column

PM2.5 by Latitude

Column

PM2.5 by region

Analysis

The eastern region appears to have a strong non-linear association between latitude and PM2.5. The western region appears to have little to no correlation between latitude and PM2.5.

Correlogram

Row

Row

Analysis

PM2.5 and latitude have a weak, negative, linear relationship. PM2.5 and longitude have a weak, positive, linear relationship. Latitude and longitude have a weak, negative, linear relationship.

---
title: "Investigation of US PM2.5 Levels"
output: 
  flexdashboard::flex_dashboard:
    theme: 
     version: 4
     bootswatch: pulse
    orientation: columns
    vertical_layout: fill
    source_code: embed 
---

```{r setup, include=FALSE}
library(flexdashboard)
library(DT)
library(tidyverse)
avgpm<-read.csv("C:/Users/write/OneDrive/Desktop/school/data m&m/in-class labs/avgpm25.csv")
```


Introduction {data-orientation=rows}
===

Row {data-height=500}
---
###
One of the national ambient air quality standards in the US concerns the long-term average level of fine particle pollution, also referred to as PM2.5. The standard says that the "annual mean, averaged over 3 years" cannot exceed 12 micrograms per cubic meter.

The "avgpm25" dataset contains the annual mean PM2.5 averaged over the period 2008 through 2010.

Row {data-height=500}
---
### Research Question
Are there any counties in the US that exceed the national standard for fine particle pollution? 

Data
===

Column {data-width=550}
---
### Glimpse of the Data
```{r}
glimpse(avgpm)
```

Column {data-width=450}
---
### Description
Each row contains:

  - [a five-digit code indicating the county (fips)](https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt#:~:text=FIPS%20codes%20are%20numbers%20which,to%20which%20the%20county%20belongs.)
  - the region of the country in which the county resides
  - the longitude of the centroid for that county
  - the latitude of the centroid for that county
  - the average PM2.5 level 
  
Boxplot
===

Column {data-width=550}
---
### Boxplot of PM2.5

```{r}
ggplot(avgpm,aes(x=pm25))+geom_boxplot(fill="lightblue")+labs(title="Distribution of PM2.5",x="PM2.5")
```


Column {data-width=450}
---
### Analysis 
The distribution excluding outliers appears to be roughly normal. There are more outliers above the upper fence, but there are still outliers below the lower fence.
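
The fences mentioned above can be computed directly. The following is a minimal sketch, assuming the usual Tukey rule of 1.5 times the interquartile range. `tukey_fences` is a hypothetical helper, not part of the dashboard. The values are the thirteen PM2.5 readings printed in full by `glimpse()` and by the filtered county table, so the result describes only that small subsample and not all 571 counties.

```r
# Hypothetical helper: lower and upper Tukey fences for a numeric vector.
# Points outside these limits are the ones a boxplot draws as outliers.
tukey_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Subsample only: 5 eastern values from glimpse(), 8 western values from the
# filtered table. This is NOT the full dataset.
pm25 <- c(10.827805, 11.583928, 11.261996, 9.414423, 11.391494,
          16.19452, 15.80378, 18.44073, 16.66180, 15.01573,
          17.42905, 16.25190, 16.18358)

fences <- tukey_fences(pm25)
outliers <- pm25[pm25 < fences["lower"] | pm25 > fences["upper"]]
fences
outliers
```

On the full data, the same helper could be called as `tukey_fences(avgpm$pm25)`.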

Histogram {data-orientation=rows}
===

Row {data-height=350}
---
### Counties that Exceed the Standard
Although the current national ambient air quality standard is 12 micrograms per cubic meter, it used to be 15. The counties that exceed the former standard of 15 micrograms per cubic meter are:
```{r}
filter(avgpm,pm25>15)
```

### Analysis of the Counties 
All of the counties with PM2.5 levels above 15 micrograms per cubic meter are located in the west. Every one of these FIPS codes also begins with 6, which is the state code 06 once the leading zero dropped by the integer column is restored, so all of the counties are located in California.
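
The state can be read off a FIPS code more explicitly by restoring the five-digit form first. This is a small sketch using the eight codes from the filtered table. The variable names are illustrative, and the split assumes the standard layout of two state digits followed by three county digits.

```r
# FIPS codes of the counties above 15 micrograms per cubic meter.
# They are stored as integers, so the leading zero is missing.
fips <- c(6019L, 6029L, 6031L, 6037L, 6047L, 6065L, 6099L, 6107L)

# Pad back to five digits, then split state (first 2) from county (last 3)
fips5 <- sprintf("%05d", fips)
state_code <- substr(fips5, 1, 2)
county_code <- substr(fips5, 3, 5)

unique(state_code)  # "06", the state code for California
```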

Row {.tabset data-height=650}
---
### Histogram of PM2.5 
This histogram of the PM2.5 data shows a cutoff value located at the current standard of 12 micrograms per cubic meter. 


```{r}
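ggplot(avgpm, aes(x = pm25)) +
  geom_histogram(bins = 30, fill = "lightblue") +
  geom_vline(xintercept = 12) +
  # annotate() draws the label once; geom_text(aes(...)) would redraw it for every row
  annotate("text", x = 12, y = 65, label = "standard PM2.5 = 12") +
  labs(title = "Distribution of PM2.5", x = "PM2.5")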
```

### Analysis of the Histogram
The majority of the data lies below the standard PM2.5 value of 12. The graph looks to be skewed right and bimodal. The median of the data appears to be around 10.
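
The share of counties at or below the standard can also be checked numerically rather than read off the plot. The sketch below assumes only the thirteen PM2.5 values printed in full in the data preview and the filtered table. Eight of those were selected because they exceed 15, so the share for this subsample is far lower than for the full dataset and should not be read as a result.

```r
standard <- 12  # current standard, micrograms per cubic meter

# Subsample only (5 eastern values from glimpse(), 8 western values from the
# filtered table). This is NOT the full dataset.
pm25 <- c(10.827805, 11.583928, 11.261996, 9.414423, 11.391494,
          16.19452, 15.80378, 18.44073, 16.66180, 15.01573,
          17.42905, 16.25190, 16.18358)

share_at_or_below <- mean(pm25 <= standard)  # proportion meeting the standard
n_above <- sum(pm25 > standard)              # count exceeding the standard
```

On the full data, the same two expressions would apply to `avgpm$pm25`.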

East vs. West
===

Column {data-width=500}
---
### Eastern vs Western US PM2.5 Levels

```{r}
ggplot(avgpm,aes(x=region,y=pm25))+geom_boxplot(fill="lightblue")+labs(title="Distribution of PM2.5 by Region",x="Region",y="PM2.5")
```


Column {data-width=500}
---
### Analysis of the Boxplots
The eastern region has less variable PM2.5 as compared to the western region. Both boxplots are symmetrical. The eastern region has outliers outside of the bottom whisker, which means they are outside the lower fence. The western region has outliers outside of the upper whisker, which means they are outside of the upper fence. The eastern region has a greater median than the western region.

Violin Plot
===

Column {.tabset data-width=500}
---
### East vs. West
```{r}
library(ggplot2)
ggplot(avgpm,aes(x=region,y=pm25))+geom_violin(fill="lightblue",trim=FALSE)+labs(title="Distribution of PM2.5 by Region",x="Region",y="PM2.5")
```

### Analysis of the Violin Plot
In the eastern region, PM2.5 values are most densely concentrated around 10. In the western region, they are most densely concentrated around 7.

Histogram by Region
===

Column {.tabset data-width=500}
---
### East vs. West
```{r}
ggplot(avgpm,aes(x=pm25))+geom_histogram(fill="lightblue")+facet_wrap(~region)+labs(title="Distribution of PM2.5 by Region",x="PM2.5")+theme(text=element_text(size=20))
```

### Analysis of the Histograms
The eastern distribution looks roughly normal but could be slightly skewed left. There also seem to be two modes, based on the two tall peaks. The western distribution is skewed right and seems to be unimodal. The eastern histogram is also taller, which suggests that its values are more concentrated around a specific value.

Scatterplots
===
Column {data-width=500}
---
### PM2.5 by Latitude 
```{r}
ggplot(avgpm,aes(x=pm25,y=latitude,color=region))+geom_point()+labs(title="Distribution of PM2.5 by Latitude",x="PM2.5",y="Latitude")
```

Column {.tabset data-width=500}
---
### PM2.5 by region
```{r}
ggplot(avgpm,aes(x=pm25,y=latitude))+geom_point(color="lightblue")+facet_wrap(~region)+
  labs(title="Distribution of PM2.5 by Latitude",x="PM2.5",y="Latitude")
```


### Analysis 
The eastern region appears to have a strong non-linear association between latitude and PM2.5. The western region appears to have little to no correlation between latitude and PM2.5.

Correlogram {data-orientation=rows}
===

Row {data-height=700}
---
###
```{r}
library(corrgram)
df<-select(avgpm,pm25,latitude,longitude)
corrgram(df)
```

Row {data-height=300}
---
### Analysis
PM2.5 and latitude have a weak, negative, linear relationship. PM2.5 and longitude have a weak, positive, linear relationship. Latitude and longitude have a weak, negative, linear relationship.
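
The strength and sign of each relationship can be read from the correlation matrix that underlies a correlogram. The following is a minimal sketch with base R's `cor()`, assuming Pearson correlation. `sample_rows` is an illustrative data frame built from the thirteen rows printed in full in the data preview and the filtered table, so its coefficients will differ from those of the full dataset, possibly even in sign.

```r
# Subsample only: 5 eastern rows from glimpse(), 8 western rows from the
# filtered table. This is NOT the full dataset.
sample_rows <- data.frame(
  pm25 = c(10.827805, 11.583928, 11.261996, 9.414423, 11.391494,
           16.19452, 15.80378, 18.44073, 16.66180, 15.01573,
           17.42905, 16.25190, 16.18358),
  latitude = c(31.18973, 33.52787, 34.73079, 30.72226, 34.50702,
               36.63837, 35.29602, 36.15514, 34.08851, 37.24578,
               33.78331, 37.61380, 36.23465),
  longitude = c(-85.35039, -86.82805, -86.58823, -88.13967, -86.91892,
                -119.9035, -118.6833, -119.8113, -118.2342, -120.6741,
                -116.8036, -120.9588, -119.1661)
)

# Pairwise Pearson correlations (the default method of cor())
cor_matrix <- cor(sample_rows)
round(cor_matrix, 2)
```

On the full data, `cor(df)` with the `df` selected above would give the coefficients behind the plot.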